# Phantom results
An overview of the T1 results for the submitted NIST phantom datasets is displayed in Figure 3. The same data are presented in each column with different axis types (linear, log, and error) to better visualize the results. In the left column (a,d), the mean T1 values and their standard deviations in each of the 14 ROIs are plotted against the temperature-corrected reference T1 values on linear axes for a representative dataset (a) and all datasets (d). The middle column (b,e) displays the same mean T1 data as (a,d) but on log-log axes. The right column (c,f) displays the error (%) of the measured T1 relative to the temperature-corrected NIST reference values; the dotted lines represent ±10% error. Figure 3a shows a strong linear trend and slight underestimation (slope = 0.98, intercept = -14 ms) for this dataset relative to the reference T1 values. However, this trend breaks down for low T1 values (T1 < 100-200 ms), as seen in the log-log plot (Figure 3b), which was expected because the imaging protocol is optimized for human brain T1 values (T1 > 500 ms). Errors exceeding 10% are observed for phantom spheres with T1 values below this threshold (Figure 3c). The same trends are observed in the all-dataset plots (Figure 3d-f). More variability around the identity line is seen in Figure 3d at very high T1 (T1 ~ 2000 ms) than in the WM-GM range (T1 ~ 600-1400 ms), which is less apparent in the log-log plot (Figure 3e). In addition to the low T1 values exceeding the 10% error threshold (Figure 3f), a few outlier measurements (~3-4) were observed in the human tissue range.
Figure 3. Measured mean T1 values vs. temperature-corrected NIST reference values of the phantom spheres, presented as linear plots (a,d), log-log plots (b,e), and plots of the error relative to the reference T1 values (c,f). Plots (a–c) are of an example single dataset, whereas plots (d–f) are of all acquired datasets.
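The error plotted in panels (c,f) is a simple percent difference from the temperature-corrected reference; a minimal sketch of that metric (the repository's `calc_error` helper may differ in detail):

```python
import numpy as np

def percent_error(reference, measured):
    """Percent error of measured T1 relative to the reference T1."""
    reference = np.asarray(reference, dtype=float)
    measured = np.asarray(measured, dtype=float)
    return 100.0 * (measured - reference) / reference

# e.g. a measured T1 of 980 ms against a 1000 ms reference is a -2% error,
# and the +/-10% dotted lines in panels (c,f) bound values of this quantity
```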
from os import path
if not path.isdir('analysis'):
    !git clone https://github.com/rrsg2020/analysis.git
# Imports
import warnings
warnings.filterwarnings("ignore")
from pathlib import Path
import pandas as pd
import json
import nibabel as nib
import numpy as np
from analysis.src.database import *
from analysis.src.nist import get_reference_NIST_values, get_NIST_ids
from analysis.src.tools import calc_error
from analysis.src.nist import temperature_correction
import matplotlib.pyplot as plt
plt.style.use('analysis/custom_matplotlibrc')
plt.rcParams["figure.figsize"] = (10,10)
fig_id = 0
database_path = Path('analysis/databases/3T_NIST_T1maps_database.pkl')
output_folder = Path("analysis/plots/03_singledataset_scatter_NIST-temperature-corrected/")
estimate_type = 'mean' # median or mean
## Define Functions
def plot_single_scatter(x, y, y_std,
                        title, x_label, y_label,
                        file_prefix, folder_path, fig_id,
                        y_type='linear'):
    if y_type == 'linear':
        plt.errorbar(x, y, y_std, fmt='o', solid_capstyle='projecting')
        ax = plt.gca()
        ax.axline((1, 1), slope=1, linestyle='dashed')
        ax.set_ylim(ymin=0, ymax=2500)
        ax.set_xlim(xmin=0, xmax=2500)
    if y_type == 'log':
        plt.loglog(x, y, 'o')
        ax = plt.gca()
        ax.set_ylim(ymin=20, ymax=2500)
        ax.set_xlim(xmin=20, xmax=2500)
    if y_type == 'error_t1':
        plt.errorbar(x, calc_error(x, y), fmt='o')
        ax = plt.gca()
        ax.axline((1, 0), slope=0, color='k')
        ax.axline((1, -10), slope=0, linestyle='dashed', color='k')
        ax.axline((1, 10), slope=0, linestyle='dashed', color='k')
        ax.set_ylim(ymin=-100, ymax=100)
        ax.set_xlim(xmin=0, xmax=2500)

    plt.title(title)
    plt.xlabel(x_label)
    plt.ylabel(y_label)
    fig = plt.gcf()

    folder_path.mkdir(parents=True, exist_ok=True)
    filename = str(fig_id).zfill(2) + "_" + file_prefix
    fig.savefig(folder_path / (filename + '.svg'), facecolor='white')
    fig.savefig(folder_path / (filename + '.png'), facecolor='white')
    fig_id = fig_id + 1
    plt.show()
    return fig_id
## Load database
df = pd.read_pickle(database_path)
## Initialize array
dataset_estimate = np.array([])
dataset_std = np.array([])

index = 6.001
serial_number = df.loc[index]['phantom serial number']

for key in get_NIST_ids():
    if estimate_type == 'mean':
        dataset_estimate = np.append(dataset_estimate, np.mean(df.loc[index][key]))
    elif estimate_type == 'median':
        dataset_estimate = np.append(dataset_estimate, np.median(df.loc[index][key]))
    else:
        raise Exception('Unsupported dataset estimate type.')
    dataset_std = np.append(dataset_std, np.std(df.loc[index][key]))

ref_values = get_reference_NIST_values(serial_number)
temperature = df.loc[index]['phantom temperature']
temp_corrected_ref_values = temperature_correction(temperature, serial_number)
output_folder = Path("analysis/plots/04_alldatasets_scatter_NIST-temperature-corrected/")
## Initialize array
dataset_mean = np.zeros((1, 14))
dataset_std = np.zeros((1, 14))
version = np.array([])
temperature = np.array([])
ref_values = np.zeros((1, 14))

ii = 0
for index, row in df.iterrows():
    if isinstance(df.loc[index]['T1 - NIST sphere 1'], np.ndarray):
        version = np.append(version, df.loc[index]['phantom serial number'])
        temperature = np.append(temperature, df.loc[index]['phantom temperature'])

        if version[ii] is None:
            version[ii] = 999  # Missing version; the only known case is one where we have version > 42 right now.
        if temperature[ii] is None:
            temperature[ii] = 20  # Missing temperature; assume 20 C (reference temperature).

        if ii == 0:
            ref_values = get_reference_NIST_values(version[ii])
            temp_corrected_ref_values = temperature_correction(temperature[ii], version[ii])
        else:
            ref_values = np.vstack((ref_values, get_reference_NIST_values(version[ii])))
            temp_corrected_ref_values = np.vstack((temp_corrected_ref_values, temperature_correction(temperature[ii], version[ii])))

        tmp_dataset_estimate = np.array([])
        tmp_dataset_std = np.array([])
        for key in get_NIST_ids():
            if estimate_type == 'mean':
                tmp_dataset_estimate = np.append(tmp_dataset_estimate, np.mean(df.loc[index][key]))
            elif estimate_type == 'median':
                tmp_dataset_estimate = np.append(tmp_dataset_estimate, np.median(df.loc[index][key]))
            else:
                raise Exception('Unsupported dataset estimate type.')
            tmp_dataset_std = np.append(tmp_dataset_std, np.std(df.loc[index][key]))

        if ii == 0:
            dataset_estimate = tmp_dataset_estimate
            dataset_std = tmp_dataset_std
        else:
            dataset_estimate = np.vstack((dataset_estimate, tmp_dataset_estimate))
            dataset_std = np.vstack((dataset_std, tmp_dataset_std))

        ii = ii + 1
## Setup for plots
fig_id = 0
dims=ref_values.shape
file_prefix = 'alldatasets'
Inter-participant coefficients of variation (COV) were calculated by selecting a single submitted T1 map per challenge participant and computing the COV of the per-sphere T1 means. The average inter-participant COV across the first five spheres, which cover the range expected in the human brain, was 6.1 % (sphere 1 = 4.7 %, sphere 2 = 3.1 %, sphere 3 = 6.3 %, sphere 4 = 12.8 %, sphere 5 = 7.3 %). Two sites were clear outliers with particular issues for sphere 4, likely due to a combination of an implementation error and the resulting uncertainty in where the signal null lies for this four-TI measurement at that T1 value; after removing these outliers, the mean inter-participant COV was reduced to 4.1 % (sphere 1 = 5.4 %, sphere 2 = 3.5 %, sphere 3 = 2.5 %, sphere 4 = 4.2 %, sphere 5 = 4.9 %). One participant measured T1 maps with a single phantom and a single implemented protocol at 7 different sites using one manufacturer's scanners, and the mean intra-participant COV across the first five spheres for this case was 2.9 % (sphere 1 = 4.9 %, sphere 2 = 3.5 %, sphere 3 = 2.6 %, sphere 4 = 2.0 %, sphere 5 = 1.6 %).
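The COV reported here is the standard deviation of the per-participant T1 means divided by their mean; a minimal sketch for one sphere (the exact challenge computation may differ, e.g. in the choice of population vs. sample standard deviation):

```python
import numpy as np

def inter_participant_cov(t1_means):
    """COV (%) across participants for one sphere: 100 * std / mean."""
    t1_means = np.asarray(t1_means, dtype=float)
    return 100.0 * np.std(t1_means) / np.mean(t1_means)

# one T1 mean per participant for a single sphere (illustrative numbers, ms)
example_cov = inter_participant_cov([940.0, 985.0, 1010.0, 970.0])
```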
Figure 4. Scatter plot comparing complex and magnitude-only fitted data. The markers are color-coded based on the implementation site, while their size represents the difference (annotated for scale) between two cases of T1 estimations for each sphere (from 1 to 7).
Figure 4 compares the mean T1 values measured using complex and magnitude-only data for the 11 datasets where authors provided both in their submissions. Note that these datasets are from the same acquisitions, not two separate acquisitions. The scatter plot shows that for the range of T1 values expected in the brain (T1 > 500 ms), there is almost no difference in fitted T1 values between the two types of data (the largest outlier indicates a ~9 ms difference). However, for T1 values below ~250 ms, there are large errors (see the dashboard), which are likely due to poor fitting with a protocol that is not optimized for this range of T1 values.
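The low-T1 discrepancy between the two data types stems from magnitude data discarding signal polarity around the inversion null, whereas complex data preserve it; a toy inversion-recovery fit illustrating the distinction (the model, TIs, and amplitudes are illustrative, not the submissions' exact pipeline):

```python
import numpy as np
from scipy.optimize import curve_fit

def ir_signal(ti, a, b, t1):
    # General inversion-recovery model: S(TI) = a + b * exp(-TI / T1)
    return a + b * np.exp(-ti / t1)

ti = np.array([50.0, 400.0, 1100.0, 2500.0])  # four TIs, loosely like the challenge protocol
t1_true = 900.0
signed = ir_signal(ti, 1.0, -2.0, t1_true)  # complex-like data keep the sign before the null
magnitude = np.abs(signed)                  # magnitude data discard it

# With polarity preserved, the fit recovers T1; fitting |S| with the same
# model would be biased unless the sign is restored first.
popt, _ = curve_fit(ir_signal, ti, signed, p0=[1.0, -2.0, 500.0])
```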
Figure 5. Hierarchical shift function analysis comparing T1 estimation error throughout the entire range of the voxel-wise distributions, split into nine deciles (q1-q9, where q5 corresponds to the median difference) in spheres 1-6, for 20 measurements. Each panel displays individual shift functions for each measurement (colored by vendor) in the top row, quantifying (overestimation if above the zero-crossing, underestimation otherwise) and characterizing (e.g., straight lines indicate a homogeneous measurement error across voxels) the percent measurement error. The bottom row in each panel (gray markers) shows the average trend of bootstrapped differences at each decile in milliseconds (e.g., a 39 ms median (q5) underestimation trend in sphere 3). High-density intervals that do not intersect the zero-crossing indicate a notable common trend at the respective decile.
The direction of the measurement error in the phantom is influenced by both the measurement site and the reference value, as indicated by the individual shift functions (Figure 5). For example, at sphere 1 (~2000 ms), nearly half of the measurements (20 shown in total) are positioned on each side of the zero-crossing. On the other hand, for sphere 3 (~1 s), nearly all the measurements show underestimation, as their shift functions lie below the zero-crossing. Bootstrapped differences capture these trends, indicating a dominant overestimation at sphere 1 (median difference of +17 ms) and underestimation at sphere 3 (median difference of -39 ms). The high-density intervals associated with these median differences do not indicate a common pattern for the former (intervals cross zero), whereas they reveal a notable underestimation trend at sphere 3 (intervals do not include zero). A similar common pattern is also observed for sphere 2 (median overestimation of 35 ms). In addition, the shape of the individual shift functions conveys information about how the voxel-wise distributions differ. For example, curved lines in sphere 2 from two different sites reveal that some of the (ROI-selected) voxels show a drastically higher underestimation that cannot be captured by comparisons of central tendency alone. Lastly, the spread of the shift functions around the zero-crossing does not indicate vendor-specific clustering for the selected measurements and reference values.
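The shift function underlying Figure 5 compares matched deciles of two distributions; a minimal single-measurement sketch of the decile-difference idea (synthetic voxel values, and without the hierarchical bootstrap of the full analysis):

```python
import numpy as np

def decile_shift(measured_voxels, reference_value):
    """Decile-wise T1 error (ms): measured-distribution deciles minus the reference."""
    deciles = np.quantile(np.asarray(measured_voxels, dtype=float),
                          np.linspace(0.1, 0.9, 9))
    return deciles - reference_value

rng = np.random.default_rng(0)
voxels = rng.normal(960.0, 30.0, size=500)  # illustrative sphere-3-like voxel T1s (ms)
shift = decile_shift(voxels, 1000.0)        # negative values mirror the underestimation trend
```

A flat shift function (all nine differences similar) indicates a homogeneous error across the voxel distribution, while a sloped or curved one indicates that the error depends on where a voxel falls in the distribution.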